Recent studies have found that the oldest and most luminous galaxies in the early
Universe are surprisingly compact, having stellar masses similar to present-day
elliptical galaxies but much smaller sizes. This finding has attracted considerable
attention as it suggests that massive galaxies have grown by a factor of ~ five in
size over the past ten billion years. A key test of these results is a determination
of the stellar kinematics of one of the compact galaxies: if the sizes of these objects
are as extreme as has been claimed, their stars are expected to have much higher
velocities than those in present-day galaxies of the same mass. Here we report a
measurement of the stellar velocity dispersion of a massive compact galaxy at redshift
z = 2.186, corresponding to a look-back time of 10.7 billion years. The velocity
dispersion is very high at $510^{+165}_{-95}$ km s$^{-1}$, consistent with the mass and
compactness of the galaxy inferred from photometric data and indicating significant recent
structural and dynamical evolution of massive galaxies. The uncertainty in the dispersion
was determined from simulations which include the effects of noise and template mismatch.
However, we caution that some subtle systematic effect may influence the analysis given
the low signal-to-noise ratio of our spectrum.

We observed the galaxy, dubbed 1255-0, with the Gemini Near-Infrared Spectrograph
(GNIRS) on the Gemini South telescope for a total of 29 hours. The de-redshifted
spectrum is shown in Fig. 1a. A detailed description of the observations and reduction,
as well as an analysis of the continuum emission and detected (weak) emission
lines, is presented in a companion paper. In the companion paper we derive a stellar
mass of ~ $2.0 \times 10^{11}\,M_\odot$ for a Kroupa initial mass function, by fitting stellar
population synthesis models to the broad band photometry and the GNIRS spectrum. The
effective radius of 1255-0 is $r_e = 0.78 \pm 0.17$ kpc, as previously measured from deep
Hubble Space Telescope (HST) NICMOS2 observations. The galaxy was selected from
a well-studied sample of nine spectroscopically-confirmed galaxies with evolved
stellar populations at z ~ 2.3, and its properties are similar to those of other galaxies
in this sample. The median stellar mass of the nine objects is $1.7 \times 10^{11}\,M_\odot$ and their
median effective radius is $r_e = 0.9$ kpc, a factor of ~ 5 smaller than galaxies with similar
masses at z = 0. The number density of these massive compact galaxies is substantial:
about the same as that of galaxies in the nearby Universe that are a factor of 2-3 more
massive.

In the present study we use the deep Gemini spectrum to measure the stellar ve- 
locity dispersion of the galaxy, by employing standard techniques for measuring the 
broadening of the absorption lines. Our methodology is explained in detail in the
Supplementary Information. Briefly, smoothed model spectra were fitted to the data in 
real space, taking observational errors into account and ignoring data with the largest 
uncertainties. The uncertainty in the dispersion was determined from Monte Carlo 
simulations of many different combinations of assumed velocity dispersions and em- 
pirical realizations of the noise. Systematic uncertainties were assessed by varying the 
templates (also allowing for multiple components), the masking and weighting, and 
the continuum filtering, and are typically much smaller than the random uncertainty. 
We note that the spectrum is available in electronic form as a Supplementary Dataset. 

We derive a velocity dispersion $\sigma = 510^{+165}_{-95}$ km s$^{-1}$ for the galaxy, which is very
high when compared to typical early-type galaxies in the nearby Universe. Although
not statistically significant, it is striking that the best-fit value exceeds the measured
dispersions of all individual galaxies in the Sloan Digital Sky Survey (SDSS). In the
SDSS a significant fraction of galaxies with velocity dispersions in excess of 350 km s$^{-1}$
are superpositions, which are easily identified with HST imaging. As shown in Fig.
1b-d, 1255-0 is a single, nearly round object with an effective radius of ~ 0.1 arcsec
in HST images. The dispersion is also a factor of ~ 2 higher than a previous measurement
from a stacked spectrum of 13 galaxies at $\langle z \rangle = 1.6$. A direct comparison
is difficult given the uncertainties associated with stacking individual spectra, but we
note that the median stellar mass of the 13 galaxies is a factor of ~ 3 smaller than that
of 1255-0 and the median effective radius is a factor of 1.5 larger. The expected dispersion
of these $\langle z \rangle = 1.6$ galaxies is therefore a factor of ~ 2 lower than that of 1255-0,
and the two results are consistent.
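
The factor of ~ 2 can be checked with a short calculation. The sketch below assumes the virial scaling $\sigma \propto (M / r_e)^{1/2}$ and that stellar mass traces dynamical mass in both samples; the function name is illustrative and not part of the original analysis.

```python
import math

def dispersion_ratio(mass_ratio, radius_ratio):
    """Ratio sigma_a / sigma_b for the assumed scaling sigma ~ sqrt(M / r_e).

    mass_ratio   = M_a / M_b
    radius_ratio = r_a / r_b
    """
    return math.sqrt(mass_ratio / radius_ratio)

# 1255-0 relative to the stacked <z> = 1.6 sample:
# about 3 times the stellar mass and 1/1.5 of the effective radius.
ratio = dispersion_ratio(mass_ratio=3.0, radius_ratio=1.0 / 1.5)
print(f"expected sigma(1255-0) / sigma(stack) ~ {ratio:.1f}")
```

With these inputs the ratio is $\sqrt{4.5} \approx 2.1$, in line with the factor of ~ 2 quoted in the text.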

The high dispersion of 1255-0 confirms that the galaxy is very massive despite its
diminutive size. The relation between mass, velocity dispersion, and size can be expressed
as $M_{\rm dyn} = C \sigma^2 r_e$, with C a constant that depends on the structure of the galaxy
and other parameters. Using log C = 5.87, which is the value that gives $M_{\rm dyn} \approx M_{\rm star}$
for galaxies in the SDSS, we find $M_{\rm dyn} = 1.5 \times 10^{11}\,M_\odot$. For log C = 6.07, the value
derived from kinematic data of present-day early-type galaxies, the dynamical mass
is $2.4 \times 10^{11}\,M_\odot$. Both estimates are in excellent agreement with the stellar mass
(Fig. 2a). Put differently, the high dispersion that we measure was expected (and in fact
predicted) given our extreme size and stellar mass measurements. Quantitatively,
the expected dispersion assuming $M_{\rm dyn} = M_{\rm star}$ and 5.87 < log C < 6.07 is in the range
470-590 km s$^{-1}$.
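
The central values in this paragraph follow directly from $M_{\rm dyn} = C \sigma^2 r_e$. The sketch below is an illustration only: it assumes $\sigma$ in km s$^{-1}$, $r_e$ in kpc, and masses in solar units (an assumption about the units of C, chosen because it reproduces the quoted numbers), and the function names are hypothetical.

```python
import math

SIGMA_KMS = 510.0      # measured velocity dispersion, km/s
R_E_KPC = 0.78         # effective radius, kpc
M_STAR_MSUN = 2.0e11   # stellar mass quoted from the companion paper, solar masses

def dynamical_mass(sigma_kms, r_e_kpc, log_c):
    """M_dyn = C * sigma^2 * r_e in solar masses (units of C assumed, see lead-in)."""
    return 10.0 ** log_c * sigma_kms ** 2 * r_e_kpc

def expected_dispersion(mass_msun, r_e_kpc, log_c):
    """Invert M = C * sigma^2 * r_e for sigma, in km/s."""
    return math.sqrt(mass_msun / (10.0 ** log_c * r_e_kpc))

for log_c in (5.87, 6.07):
    m_dyn = dynamical_mass(SIGMA_KMS, R_E_KPC, log_c)
    sigma_exp = expected_dispersion(M_STAR_MSUN, R_E_KPC, log_c)
    print(f"log C = {log_c}: M_dyn = {m_dyn:.2e} Msun, "
          f"expected sigma = {sigma_exp:.0f} km/s")
```

The two values of log C give dynamical masses of about 1.5 and 2.4 in units of $10^{11}\,M_\odot$, and expected dispersions of roughly 590 and 470 km s$^{-1}$, matching the range given above.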

At the same time, the high dispersion confirms and extends the notion that qui- 
escent galaxies at high redshift are structurally and dynamically very different from 
galaxies in the present-day Universe. Figures 2b-d show where 1255-0 falls with respect
to the relations between velocity dispersion, size, and dynamical mass defined
by SDSS galaxies. The galaxy is offset from the local relations, consistent with previous
studies which were based on stellar masses derived from photometric data. At fixed
dynamical mass the dispersion is higher by a factor of ~ 2.5 and the effective radius is
smaller by a factor of ~ 6. The most dramatic offset is in the $\log \sigma$ - $\log r_e$ plane (Fig.
2b). These two quantities are measured directly and independently, and (to first order) 
do not depend on stellar populations. 

The extreme compactness of massive high redshift galaxies is qualitatively consis- 
tent with models in which the central parts of massive galaxies form early in highly 
dissipative processes, although it remains to be seen whether such models can produce
objects with the size and velocity dispersion of 1255-0. In particular, it may be
difficult to funnel gas clumps into an extremely compact configuration without forming
stars at larger radii. Regardless of the details of the model, in its star-forming phase
at z > 3.5 the galaxy likely had a very compact molecular gas distribution with a rotation
velocity of ~ 700 km s$^{-1}$. The median rotation velocity of CO in submm galaxies
at z = 2 - 3.5 has been found to be ~ 470 km s$^{-1}$ (assuming $V_{\rm rot} = 0.6 \times$ FWHM),
a high value by most standards, but still somewhat lower than what we expect for the
progenitors of galaxies such as 1255-0. There is not yet much information on the gas
dynamics of massive galaxies at redshifts z > 3.5. The z = 6.4 quasar SDSS J1148+5251
has a relatively small CO linewidth of $V_{\rm rot} \sim 170$ km s$^{-1}$, but it may be that quasars
are biased low because their gas disks are preferentially seen face-on. It is obviously 
not clear whether the gas was ever in a regular disk; it would be interesting to deter- 
mine whether 1255-0 shows rotation, but that requires imaging (or spectroscopy) of 
higher spatial resolution than is currently available. 

A problem that is perhaps even more vexing than the origin of galaxies such as
1255-0 is their subsequent evolution onto the local relations between size, velocity
dispersion, and mass. The simplest explanation is that the mass and/or size measurements
of the compact galaxies are incorrect, but this is difficult to maintain given
the dynamical measurement presented here. We are left with the conclusion that very
significant structural and dynamical changes are required to bring massive, quiescent
high redshift galaxies to the local relations. This cannot easily be achieved through
star formation as the compact high redshift galaxies already appear to have stopped
forming new stars, consistent with the old ages inferred for the stars in today's most
massive galaxies. Among the models that have been proposed, minor mergers may
be the most effective single mechanism, as simple virial arguments suggest that the
velocity dispersion changes by a factor of $f_r^{-1/2}$ for a factor of $f_r$ change in radius.
However, it is an open question whether mergers alone can "puff up" galaxies by the
required amount, as the precise effect depends on the accretion rate, the masses, orbits,
and gas content of accreted galaxies, angular momentum transfer between stars and
dark matter, and on possible evolution in the contribution of dark matter to the measured
kinematics. Finally, we note that evolution in the velocity dispersion of galaxies
would trivially imply evolution in the black hole mass - $\sigma$ relation, such that black
hole masses are lower at fixed $\sigma$ at high redshift.
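
To give a feel for the virial scaling mentioned in this paragraph, the sketch below applies $\sigma \propto f_r^{-1/2}$ to the measured dispersion. It is a back-of-the-envelope illustration under that scaling alone; it ignores every complication listed above (accretion rate, orbits, gas content, dark matter), and the function name is hypothetical.

```python
def dispersion_after_growth(sigma_initial_kms, f_r):
    """Velocity dispersion after the radius grows by a factor f_r,
    assuming sigma scales as f_r**-0.5."""
    return sigma_initial_kms * f_r ** -0.5

for f_r in (2.0, 5.0, 6.0):
    sigma = dispersion_after_growth(510.0, f_r)
    print(f"f_r = {f_r:.0f}: sigma = {sigma:.0f} km/s")
```

Under this scaling, growth by a factor of five to six in radius lowers a dispersion of 510 km s$^{-1}$ to roughly 210-230 km s$^{-1}$.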

While confirming that the velocity dispersions of compact galaxies are high, our 
measurement is obviously not sufficiently accurate to properly characterize the evolution
of the relations in Fig. 2. A $1\sigma$ error of 25 % in the velocity dispersion implies an
error of 56 % in the dynamical mass, and further progress requires dispersions with un- 
certainties < 10 % for much larger samples. New spectrographs being readied for use 
on 8m class telescopes, combined with new wide field imaging surveys that can pro- 
vide sufficiently bright targets, are expected to revolutionize this field in the next few 
years. As indicated here, such observations are crucial for calibrating stellar masses 
at high redshift and for measuring the structural and dynamical evolution of massive 
galaxies from the time that their star formation was quenched to the present. 
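
One reading that reproduces the 56 % figure is to apply the fractional error to $\sigma$ before squaring, $(1.25)^2 - 1 \approx 0.56$, rather than using linearised propagation (which would give 50 %). A minimal check, assuming $M_{\rm dyn} \propto \sigma^2$ at fixed $r_e$ and C; the function name is illustrative.

```python
def mass_error_from_sigma_error(frac_sigma_error):
    """Fractional change in M_dyn ~ sigma^2 when sigma is raised by the given fraction."""
    return (1.0 + frac_sigma_error) ** 2 - 1.0

print(f"25% in sigma -> {mass_error_from_sigma_error(0.25):.0%} in M_dyn")
print(f"10% in sigma -> {mass_error_from_sigma_error(0.10):.0%} in M_dyn")
```

By the same arithmetic, the 10 % target in dispersion corresponds to about 21 % in dynamical mass.
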
Figure 1 | Spectrum and HST images of 1255-0 at z = 2.186. a, Spectrum that was used
to measure the velocity dispersion. Light grey shows the spectrum at a resolution of 5 Å
(~ 100 km s$^{-1}$), which was used for the actual measurement. A smoothed version of the same
data (using a 25 Å boxcar filter) is shown in black. Regions around detected emission lines
are shown in orange and were excluded from the fits. The most prominent absorption lines
are H$\beta$ at $\lambda$4861 Å and Mg at $\lambda$5172 Å. The best-fitting stellar population synthesis model,
smoothed to the best-fitting velocity dispersion, is shown in red. The inset shows the results
of Monte Carlo simulations to determine the uncertainty in the best-fitting velocity dispersion.
The curves show how often a dispersion of 510 km s$^{-1}$ is measured given the true dispersion
and noise. The two curves are for two different methods of simulating noise: shuffling the
residuals of the fit in the wavelength direction (blue curve), and extracting "empty" 1D spectra
from the 2D spectrum (red curve). b-d, The HST NICMOS2 image of the galaxy in the $H_{160}$
filter, the best-fitting model of the galaxy (with the effective radius indicated in red), and the
residual obtained by subtracting the model from the data. The galaxy is a single, very compact
object with an effective radius of 0.78 kpc. Its coordinates are
$\alpha = 12^{\rm h}54^{\rm m}59.6^{\rm s}$, $\delta = +01^\circ 11' 30''$ (J2000), its K band observed magnitude
is 19.26 (Vega) and its R band observed magnitude is 24.98 (Vega). Alternative names that
have been used for this object are 1256-151 and 1256-0.
Figure 2 | Properties of 1255-0 compared to nearby galaxies. a, Relation between stellar
mass and dynamical mass. Small symbols are galaxies in the SDSS in the redshift range
0.05-0.07, and the large red symbol is galaxy 1255-0 at z = 2.186. Our definition of dynamical
mass, $\log M_{\rm dyn} = 5.87 + 2 \log \sigma + \log r_e$, leads to a one-to-one correspondence between stellar
mass and dynamical mass for SDSS galaxies. Despite its small size 1255-0 has a very high mass,
similar to elliptical galaxies today. The dynamical mass is consistent (within $1\sigma$) with the stellar
mass that was estimated from fitting stellar population synthesis models to the photometry.
b-d, Relations between velocity dispersion, effective radius, and dynamical mass. Note that
these three panels do not depend on stellar populations (except indirectly through the fact that
the spectrum and the Hubble Space Telescope image are weighted by luminosity, not mass). It
is clear that the structure and kinematics of 1255-0 are fundamentally different from those of
nearby galaxies, and significant evolution is required to bring this object to the local relations.
Supplementary Information: Robustness of the velocity dispersion

The S/N ratio of the spectrum ranges from 5-8 per resolution element between the sky lines,
which is about a factor of two lower than what is typically used for velocity dispersion
measurements. We are helped by the unusually large rest-frame wavelength coverage of the
spectrum, the large value of the dispersion, and the fact that in this case a measurement with a
$1\sigma$ uncertainty of ~ 20 % is already highly informative. However, a concern is that systematic
effects can dominate when the S/N is low. Here we describe several tests we performed to assess
the robustness of the measurement.

Templates: The default template is the best fit to the rest-frame UV - optical SED. This is a
Solar metallicity model with an age of 2.1 Gyr, an e-folding time $\tau$ = 0.3 Gyr, and extinction
$A_V$ = 0.25 mag. We also fitted single stellar populations (SSPs) with a range of ages and
metallicities. We note that the linestrength is not fixed but a free parameter in the fit. The youngest
models that were used have ages of 0.5 Gyr, despite the fact that such models provide a poor
fit to the overall SED. Composite models were also tried, consisting of a 0.5 Gyr component
in addition to a maximally-old population. Finally we fitted several stellar spectra that we
used previously at lower redshifts, even though they span a smaller wavelength range than
the galaxy spectrum. The velocity dispersion is fairly insensitive to the choice of template,
varying mostly well within the quoted uncertainty, particularly when the absorption-line
redshift is constrained to be within several hundred km s$^{-1}$ of the emission-line redshift. This is
illustrated in Supp. Fig. 1, which shows the reduced $\chi^2$ as a function of velocity dispersion for
the default template and for three examples of alternative templates.

Fitting region: The S/N ratio of the full spectrum is just sufficient for a dispersion measurement,
and fitting over smaller wavelength regions gives results that are much less robust. Most of the
signal is contributed by the observed H band: in the J band the S/N is lower, and in the K
band the absorption lines are weaker. Fitting to the observed J + H bands gives $\sigma$ = 480 km s$^{-1}$
and fitting to the H + K bands gives $\sigma$ = 590 km s$^{-1}$, both with large uncertainties. We
illustrate which wavelength regions contribute most to our measurement in Supp. Fig. 2. The only
rest-frame region that "prefers" a significantly smaller dispersion is just blueward of the 4000 Å
break. This region has relatively low S/N in our spectrum and it is notoriously sensitive to
template mismatch.

Fitting method: The spectrum was fit in real space, weighting by the S/N ratio and disregarding
data with the highest noise (mostly the wavelength regions between the atmospheric
windows). Emission lines were excluded; excluding the H$\beta$ absorption line has no effect on
the measured dispersion. Template spectra were convolved to the instrumental resolution of
$\sigma_{\rm instr} \sim 140$ km s$^{-1}$ (taking their intrinsic resolution into account) and subsequently broadened
to a finely sampled grid of velocity dispersions. Continuum filtering was done with a 5th order
polynomial; we verified that the results are insensitive to the order of the fit. The redshift and
the normalization of the residual of the continuum fit (i.e., the linestrength) are free parameters
in the fit. Two of us (P.v.D. and M.F.) independently wrote software to fit the spectrum, and our
results are fully consistent.
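
The resolution matching described above can be illustrated with Gaussian widths adding in quadrature. This is a sketch under the assumption that both the instrumental profile and the template resolution are Gaussian; the intrinsic template resolution used here is a hypothetical placeholder, not a value from the text, and the function names are illustrative.

```python
import math

def kernel_sigma(target_kms, intrinsic_kms):
    """Gaussian kernel width that takes a template from its intrinsic
    resolution to the target resolution (widths add in quadrature)."""
    if target_kms < intrinsic_kms:
        raise ValueError("template is already coarser than the target resolution")
    return math.sqrt(target_kms ** 2 - intrinsic_kms ** 2)

def observed_width(sigma_galaxy_kms, sigma_instr_kms):
    """Total Gaussian line width after instrumental broadening."""
    return math.hypot(sigma_galaxy_kms, sigma_instr_kms)

SIGMA_INSTR = 140.0         # km/s, instrumental resolution from the text
TEMPLATE_INTRINSIC = 70.0   # km/s, hypothetical value for illustration only

print(f"kernel width: {kernel_sigma(SIGMA_INSTR, TEMPLATE_INTRINSIC):.0f} km/s")
print(f"observed width for sigma = 510 km/s: {observed_width(510.0, SIGMA_INSTR):.0f} km/s")
```

Because the galaxy's dispersion is several times the instrumental resolution, the instrumental term changes the total line width by only a few per cent in this picture.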

Determination of uncertainty: The quoted uncertainty was determined from Monte Carlo
simulations. We created a finely sampled grid of "true" dispersions. For each dispersion
$\sigma_{\rm true}$ a model spectrum was created. Next, 500 realizations of the model spectrum were
created by adding realistic noise. These 500 noisy spectra were fit with the same procedure as
used for the actual data. The template was allowed to vary in the simulations. For
$\sigma_{\rm true} < 510$ km s$^{-1}$ the probability $P(\sigma_{\rm true})$ is then the fraction of simulations that
give a best-fit dispersion in excess of 510 km s$^{-1}$, and for $\sigma_{\rm true} > 510$ km s$^{-1}$ it is the
fraction of simulations that give measured dispersions below 510 km s$^{-1}$.
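
The logic of this procedure can be sketched with a toy problem. The code below is not the software used for the measurement: it replaces the galaxy spectrum with a single Gaussian absorption line of fixed depth, uses white Gaussian noise instead of empirical noise, and keeps the template fixed. All names, the velocity axis, the line depth, and the noise level are assumptions made for illustration.

```python
import math
import random

V_KMS = [-3000.0 + 100.0 * i for i in range(61)]   # toy velocity axis, km/s
GRID = [200.0 + 25.0 * i for i in range(29)]        # trial dispersions, km/s

def line_profile(sigma_kms, depth=0.3):
    """Toy 'spectrum': one Gaussian absorption line of fixed depth."""
    return [1.0 - depth * math.exp(-0.5 * (v / sigma_kms) ** 2) for v in V_KMS]

MODELS = {s: line_profile(s) for s in GRID}

def best_fit_sigma(spectrum):
    """Grid search for the trial dispersion with the smallest squared residuals."""
    def chi2(s):
        return sum((d - m) ** 2 for d, m in zip(spectrum, MODELS[s]))
    return min(GRID, key=chi2)

def tail_probability(sigma_true, sigma_measured, n_real=500, noise_rms=0.05, rng=None):
    """Fraction of noisy realizations whose best fit lies beyond sigma_measured.

    'Beyond' means above it when sigma_true < sigma_measured and below it
    otherwise, mirroring the definition of P(sigma_true) in the text.
    """
    rng = rng or random.Random(0)
    clean = line_profile(sigma_true)
    hits = 0
    for _ in range(n_real):
        noisy = [c + rng.gauss(0.0, noise_rms) for c in clean]
        fit = best_fit_sigma(noisy)
        if sigma_true < sigma_measured:
            hits += fit > sigma_measured
        else:
            hits += fit < sigma_measured
    return hits / n_real

if __name__ == "__main__":
    for sigma_true in (450.0, 475.0, 500.0, 525.0, 550.0, 575.0):
        p = tail_probability(sigma_true, sigma_measured=510.0, n_real=200)
        print(f"sigma_true = {sigma_true:.0f} km/s: P = {p:.3f}")
```

Plotting P against the true dispersion gives a curve that peaks near the measured value and falls off on either side; the width of that curve is what sets the quoted uncertainty.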

The curves in the inset of Fig. 1a show the results of the simulations for two different ways
to create realistic noise spectra. The simulations shown by the blue curve randomly
redistributed the residuals from the best fit in the wavelength direction. This approach has the
advantages that the noise characteristics of the data are exactly maintained and that the
contribution of template mismatch to the residuals is included. We shuffled blocks of 5 pixels (25 Å)
rather than individual pixels to retain the effects of correlated noise. The simulations that gave
rise to the red curve were created by combining random rows of the two-dimensional spectrum.
This approach has the advantage that the wavelength dependence of the noise is maintained.
As shown in Fig. 1a both approaches give very consistent results. The quoted uncertainty was
calculated from the blue curve as it is slightly wider than the red curve. Both simulations give
a slightly larger uncertainty than the formal error from the $\chi^2$ distribution (shown in Supp.
Fig. 2). Finally, we note that the true uncertainty may be (even) larger than the quoted
uncertainty due to some unknown systematic effect; as is well known, systematic effects become
increasingly important at low S/N ratios. We provide the spectrum in electronic form as a
Supplementary Dataset.
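
The block shuffling used for the blue curve can be sketched as follows. This is a minimal illustration of the idea, assuming the residuals are held in a simple list ordered by wavelength; the function names are hypothetical, and the second method (combining rows of the two-dimensional spectrum) is not covered.

```python
import random

def shuffle_in_blocks(residuals, block_size=5, rng=None):
    """Permute contiguous blocks of residuals; the order within each block is
    kept, so pixel-to-pixel correlations inside a block are preserved."""
    rng = rng or random.Random()
    blocks = [residuals[i:i + block_size]
              for i in range(0, len(residuals), block_size)]
    rng.shuffle(blocks)
    return [value for block in blocks for value in block]

def noisy_realization(model, residuals, block_size=5, rng=None):
    """Model spectrum plus block-shuffled residuals from the best fit."""
    shuffled = shuffle_in_blocks(residuals, block_size, rng)
    return [m + r for m, r in zip(model, shuffled)]

# Example with made-up numbers: a flat model and 20 residual pixels.
residuals = [0.1 * ((i * 7) % 11 - 5) for i in range(20)]
model = [1.0] * 20
fake_spectrum = noisy_realization(model, residuals, rng=random.Random(1))
```

Every residual value appears exactly once in each realization, which is why the overall noise distribution of the data is maintained exactly.
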
Supplementary Figure 1 | Reduced $\chi^2$ as a function of velocity dispersion. The black
curve shows results for the default template, which is the best-fit stellar population synthesis
model to the spectrum and broad-band photometry. There is a well-defined minimum at
$\approx$ 500 km s$^{-1}$. The blue, red, and green curves are examples of alternative templates, and
illustrate the effects of changing the template. The blue curve is for a single-age stellar
population with age 2 Gyr and metallicity 0.4 x Solar; the red curve is for a 2 Gyr population with
metallicity 2.5 x Solar; and the green curve is for a composite model consisting of a 0.5 Gyr
old component in addition to a maximally-old component. These alternative templates have
higher minimum $\chi^2$ values than the default template, as expected. Importantly, the minima
occur at approximately the same dispersion as for the default template.
Supplementary Figure 2 | Analysis of residuals from the fit. a, Models with two different
velocity dispersions. The red model has the best-fit velocity dispersion of 510 km s$^{-1}$ and the
blue model has a dispersion that is a factor of two lower. The differences between the models
are subtle for any individual feature but are significant when the entire spectrum is considered.
b, Residuals from the best-fit spectrum, normalized by the expected noise, are shown in grey.
The residuals are well-behaved, and the reduced $\chi^2$ for the best fit is close to 1. The colored
band shows which of the two models provides the best fit as a function of wavelength. Each
vertical bar is a median of the nearest 50 datapoints (corresponding to $\approx$ 100 Å in the
rest-frame). The hue (red or blue) indicates which model fits best and the intensity indicates the
absolute size of the difference between the residual from the low dispersion fit and the residual
from the high dispersion fit, with white implying that both fits are equally good. With the
exception of the region just blueward of the 4000 Å break the high-dispersion model provides
a better fit than the low-dispersion model. This demonstrates that our results are not driven by
a small wavelength region or by other (obvious) wavelength-dependent effects.



